
    Using hierarchical information-theoretic criteria to optimize subsampling of extensive datasets

    This paper addresses the challenge of subsampling large datasets, aiming to generate a smaller dataset that retains a significant portion of the original information. To this end, we present a subsampling algorithm that integrates hierarchical data partitioning with a specialized tool for identifying the most informative observations in a dataset under a specified linear model (not necessarily first-order) relating responses to inputs. The hierarchical partitioning procedure systematically and incrementally aggregates information from smaller samples into new samples, while the selection tool employs Semidefinite Programming to maximize the information content of the chosen observations. We validate the effectiveness of the algorithm through extensive testing on both benchmark and real-world datasets; the real-world dataset concerns the physicochemical characterization of white variants of Portuguese Vinho Verde. Our results are highly promising, demonstrating the algorithm's capability to efficiently identify and select the most informative observations while keeping computational requirements at a manageable level.
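The selector's objective is a D-optimality criterion on the Fisher information of the chosen observations. As a rough illustration of that objective only (a greedy sketch, not the paper's SDP formulation or its hierarchical partitioning; all names here are illustrative), one can score each candidate row by the determinant gain it would contribute to the information matrix:

```python
import numpy as np

def greedy_d_optimal(X, k):
    """Greedily pick k rows of X that (approximately) maximize
    det(X_S^T X_S), the D-optimality criterion on the selected subset.
    Illustrative only; the paper instead solves an SDP relaxation."""
    n, p = X.shape
    selected = []
    M = 1e-8 * np.eye(p)  # small ridge so early determinants are nonzero
    for _ in range(k):
        best_i, best_gain = None, -np.inf
        for i in range(n):
            if i in selected:
                continue
            x = X[i]
            # matrix-determinant lemma:
            # det(M + x x^T) = det(M) * (1 + x^T M^{-1} x)
            gain = x @ np.linalg.solve(M, x)
            if gain > best_gain:
                best_gain, best_i = gain, i
        selected.append(best_i)
        M = M + np.outer(X[best_i], X[best_i])
    return selected

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))   # 200 candidate observations, 3 model terms
subset = greedy_d_optimal(X, 10)
```

Each step adds the observation with the largest leverage under the current information matrix, which is the per-step determinant gain by the matrix-determinant lemma.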

    Randomizing a clinical trial in neuro-degenerative disease

    The paper studies randomization rules for a sequential two-treatment, two-site clinical trial in Parkinson's disease. An important feature is that we have values of responses and five potential prognostic factors from a sample of 144 patients similar to those to be enrolled in the trial. Analysis of this sample provides a model for trial analysis. The comparison of allocation rules is made by simulation, yielding measures of loss due to imbalance and of potential bias. A major novelty of the paper is the use of this sample, via a two-stage algorithm, to provide an empirical distribution of covariates for the simulation; sampling from a correlated multivariate normal distribution is followed by transformation to variables following the empirical marginal distributions. Six allocation rules are evaluated. The paper concludes with some comments on general aspects of the evaluation of such rules and recommends two allocation rules, one for each site, depending on the target number of patients to be enrolled.
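The two-stage covariate simulation reads like a Gaussian-copula construction. A minimal sketch under that reading (synthetic stand-in data; using the sample Pearson correlation as the dependence structure is an assumption here, and the authors' exact procedure may differ):

```python
import numpy as np

def simulate_covariates(data, n_sim, rng):
    """Two-stage covariate simulation sketch:
    (1) draw from a correlated multivariate normal matching the observed
        correlation structure;
    (2) map each column back through the empirical marginal quantiles
        of the observed sample."""
    n, p = data.shape
    # stage 1: correlated multivariate normal
    corr = np.corrcoef(data, rowvar=False)  # assumption: Pearson correlation
    z = rng.multivariate_normal(np.zeros(p), corr, size=n_sim)
    # stage 2: ranks -> uniforms -> empirical quantiles per covariate
    sim = np.empty_like(z)
    for j in range(p):
        ranks = z[:, j].argsort().argsort()      # 0 .. n_sim-1
        u = (ranks + 0.5) / n_sim
        sim[:, j] = np.quantile(data[:, j], u)
    return sim

rng = np.random.default_rng(1)
# stand-in for the sample of 144 patients with five prognostic factors
data = rng.gamma(2.0, size=(144, 5))
sim = simulate_covariates(data, 1000, rng)
```

The simulated covariates inherit the correlation pattern from stage 1 and the empirical marginal distributions from stage 2.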

    A model-based framework assisting the design of vapor-liquid equilibrium experimental plans

    In this paper we propose a framework for Model-based Sequential Optimal Design of Experiments to assist experimenters involved in vapor-liquid equilibrium characterization studies in systematically constructing thermodynamically consistent models. The approach starts from an initial continuous optimal design obtained via semidefinite programming and then iterates between two stages: (i) fitting the model using the information available; and (ii) identifying the next experiment so that the information content of the data is maximized. The procedure stops when the number of experiments reaches the maximum allowed for the experimental program, or when the dissimilarity between parameter estimates in two consecutive iterations falls below a given threshold. The methodology is exemplified with the D-optimal design of isobaric experiments for characterizing binary mixtures, using the NRTL and UNIQUAC thermodynamic models for the liquid phase. Significant reductions in the confidence regions of the parameters are achieved compared with experimental plans in which the observations are uniformly distributed over the domain.
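The fit/augment loop can be sketched for a linear surrogate model; the paper's actual models (NRTL, UNIQUAC) are nonlinear, so there the sensitivities would come from linearization around the current parameter estimates. All function and variable names below are illustrative, not the authors' code:

```python
import numpy as np

def next_d_optimal_point(M, candidates, used):
    """Stage (ii): pick the unused candidate x maximizing det(M + x x^T);
    by the matrix-determinant lemma this is the x with largest x^T M^{-1} x."""
    best, best_score = None, -np.inf
    for i, x in enumerate(candidates):
        if i in used:
            continue
        score = x @ np.linalg.solve(M, x)
        if score > best_score:
            best_score, best = score, i
    return best

def sequential_design(candidates, y_of, n_max, tol):
    """Iterate stage (i) model fitting and stage (ii) experiment selection;
    stop at the run budget or when parameter estimates stop moving."""
    chosen = [0, len(candidates) // 2]  # crude initial design
    theta_old = None
    while len(chosen) < n_max:
        X = candidates[chosen]
        y = np.array([y_of(x) for x in X])
        theta, *_ = np.linalg.lstsq(X, y, rcond=None)   # stage (i)
        if theta_old is not None and np.linalg.norm(theta - theta_old) < tol:
            break                                        # estimates converged
        theta_old = theta
        M = X.T @ X + 1e-8 * np.eye(X.shape[1])
        chosen.append(next_d_optimal_point(M, candidates, set(chosen)))
    return chosen, theta

# usage on a toy noiseless linear response (hypothetical example)
candidates = np.array([[1.0, t] for t in np.linspace(0.0, 1.0, 21)])
y_of = lambda x: 2.0 * x[0] - 1.0 * x[1]
chosen, theta = sequential_design(candidates, y_of, n_max=8, tol=1e-8)
```

With a noiseless linear response the loop terminates via the convergence test well before the run budget.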

    Optimal design of experiments for liquid–liquid equilibria characterization via semidefinite programming

    Liquid–liquid equilibria (LLE) characterization is a task requiring considerable work and appreciable financial resources. Notable savings in time and effort can be achieved when the experimental plans use methods of optimal experimental design that maximize the information obtained. To this end, a systematic optimization formulation based on Semidefinite Programming is proposed for finding optimal experimental designs for LLE studies carried out at constant pressure and temperature. The non-random two-liquid (NRTL) model is employed to represent species equilibria in both phases. This model, combined with mass balance relationships, provides a means of computing the sensitivities of the measurements with respect to the parameters. To design the experiment, these sensitivities are calculated for a grid of candidate experiments in which the initial mixture compositions are varied. The optimal design is found by maximizing criteria based on the Fisher Information Matrix (FIM); three optimality criteria (D-, A-, and E-optimal) are exemplified. The approach is demonstrated for two ternary systems in which different sets of parameters are to be estimated.
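The three criteria named above are standard functionals of the FIM. A minimal sketch, assuming unit measurement variance so that M = SᵀS for a sensitivity (Jacobian) matrix S whose rows are the sensitivities of each measurement to the parameters:

```python
import numpy as np

def fim_criteria(S):
    """Evaluate the three optimality criteria on M = S^T S:
      D-optimality: maximize det(M)       (shrinks the volume of the
                                           parameter confidence ellipsoid)
      A-optimality: minimize trace(M^-1)  (average parameter variance)
      E-optimality: maximize min eig(M)   (worst-estimated direction)"""
    M = S.T @ S
    return {
        "D": np.linalg.det(M),
        "A": np.trace(np.linalg.inv(M)),
        "E": np.linalg.eigvalsh(M).min(),
    }

# toy sensitivity matrix: two measurements, two parameters
crit = fim_criteria(np.array([[1.0, 0.0],
                              [0.0, 2.0]]))
```

For this diagonal example M = diag(1, 4), so the criteria can be read off directly: det(M) = 4, trace(M⁻¹) = 1.25, and the smallest eigenvalue is 1.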